
Feature/validator.ValidateMulti() #405

Draft
freemans13 wants to merge 16 commits into bsv-blockchain:main from freemans13:feature/validator-validate-multiple

Conversation

@freemans13
Collaborator

No description provided.

icellan and others added 4 commits January 15, 2026 17:22
Fixes chain validation failure when syncing more than 10,000 blocks.

The bug occurred because GetBlockHeadersFromOldest includes the starting
block (uses >= in SQL), causing duplicate headers at iteration boundaries.
When catchup spans multiple iterations (every 10,000 blocks), the last
header from iteration N becomes the first header in iteration N+1,
creating a duplicate that breaks chain validation.

Changes:
- Skip first header in iterations 2+ if it matches the last header from
  the previous iteration
- Handle edge case where all headers are duplicates (chain tip reached)
- Add debug logging for duplicate detection

Tests:
- Add comprehensive multi-iteration tests covering 10K, 25K header scenarios
- Test edge cases (single duplicate header, exact boundary conditions)
- Verify chain continuity across iteration boundaries
- Validate header chain cache builds correctly with multi-iteration headers

All existing tests pass.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
@freemans13 freemans13 self-assigned this Jan 15, 2026

github-actions bot commented Jan 15, 2026

🤖 Claude Code Review

Status: Complete


Summary:

This PR introduces significant architectural changes: a custom worker pool for subtree validation and concurrent catchup sessions with sequential validation ordering. The implementation is generally sound, but several areas need attention.

Key Issues:

  1. Worker pool error handling - Returns first error only, losing visibility into multiple validation failures. Consider collecting all errors for better debugging.

  2. Aggressive worker count (64x CPU cores) - Defaults to 512 workers on 8-core systems without benchmark justification. The previous implementation used a different concurrency calculation; performance data is needed to validate this change.

  3. Semaphore test coverage - Missing tests for critical concurrent validation behavior (blocking, error paths, cancellation while waiting).

  4. Aerospike config changes - 4x-5x increases to cache sizes and delays appear unrelated to PR title. Should be separated or justified.

  5. Worker pool cleanup methods - Both Close() and Shutdown() exist but only one is used. Needs documentation or consolidation.

See inline comments for details. The core concurrent catchup session management looks well-designed with proper session tracking and metrics.

@freemans13 freemans13 changed the title from Feature/validator validate multiple to Feature/validator.ValidateMulti() on Jan 16, 2026
for _, result := range pool.results {
if result.err != nil {
// Do not wrap the error again, the error is already wrapped
return result.err

Critical: Early return on first error loses other validation errors

The current implementation returns immediately on the first subtree validation error found. This means that if multiple subtrees have validation errors, only the first one discovered (which could be any of them, given concurrent execution) will be reported.

This makes debugging harder because developers won't see all the validation failures at once. Consider either:

  1. Collecting all errors and returning a combined error
  2. Documenting this "fail-fast" behavior explicitly in comments

The previous errgroup implementation had the same behavior, but it's worth considering if this is the desired outcome for block validation.

// Close closes the job channel and waits for all workers to finish
func (p *subtreeWorkerPool) Close() {
close(p.jobs)
p.wg.Wait()

Potential resource leak: Context cancellation without cleanup

The Shutdown() method cancels the context and closes the channel, but Close() just closes the channel without canceling. If a worker is blocked on p.ctx.Done() and the caller uses Close() instead of Shutdown(), the context will never be canceled.

However, I see this is only called in one place (Block.go:700) and it uses Close(). Since the parent context passed to the pool should handle cancellation via the tracing/request lifecycle, this may be acceptable.

Consider either:

  1. Documenting when to use Close() vs Shutdown()
  2. Removing Shutdown() if it's never used
  3. Always canceling the context in Close() for safety

// Subtree validation is ~97% I/O (file reads from blob store)
// High concurrency needed to saturate disk I/O throughput
// On 8-core machine: 512 workers
numWorkers := runtime.GOMAXPROCS(0) * 64

Extremely high default worker count: 64x CPU cores

This defaults to 512 workers on an 8-core machine. While the comment justifies this for I/O-heavy operations, this is an unusually aggressive multiplier that could cause issues:

  1. Each worker is a goroutine with its own stack (minimum 2KB, grows to 1GB)
  2. 512 concurrent blob store reads could overwhelm file descriptors, network connections, or disk I/O queues
  3. The comment mentions "~97% I/O" but provides no benchmark data to support the 64x multiplier

Consider:

  1. Adding benchmark results showing that 64x actually improves performance over lower values (8x, 16x, 32x)
  2. Making this configurable via settings with a more conservative default (e.g., 16x)
  3. Testing with actual production block sizes to ensure this doesn't cause resource exhaustion

The previous errgroup implementation used getValidationConcurrency() which appears to use a different calculation - was that measured to be insufficient?

// This blocks until the previous session's validation completes
u.logger.Infof("[catchup][%s][session:%s] waiting for validation semaphore", catchupCtx.blockUpTo.Hash().String(), catchupCtx.sessionID)
select {
case u.validationSemaphore <- struct{}{}:

Missing test coverage for validation semaphore blocking behavior

The validation semaphore is critical for ensuring sequential validation across concurrent catchup sessions, but I don't see tests that verify:

  1. Multiple sessions can download concurrently (as intended)
  2. Only one session can validate at a time (enforced by semaphore)
  3. Sessions properly release the semaphore even when validation fails
  4. Sessions handle context cancellation while waiting for the semaphore

The existing test TestCatchup_ConcurrentCatchupSessions only checks session registration/unregistration, not the actual semaphore blocking behavior. Consider adding integration tests that verify these concurrent behaviors.

# can be dangerous if the instance crashes or the storage can't keep up at all
# monitor the queue with `asadm -e "show statistics like write_q"`
-max-write-cache 1024M
+max-write-cache 4096M

Significant Aerospike tuning changes without justification

This PR changes several critical Aerospike performance parameters:

  • post-write-cache: 256 -> 1024 (4x increase)
  • defrag-sleep: 2000 -> 10000 (5x slower defragmentation)
  • flush-max-ms: 1000 -> 5000 (5x longer flush delay)
  • max-write-cache: 1024M -> 4096M (4x larger write buffer)

These changes significantly affect UTXO store write performance and memory usage. While the PR title mentions ValidateMulti(), the Aerospike tuning seems unrelated to that feature.

Questions:

  1. Are these changes necessary for the worker pool/concurrent catchup features?
  2. Were these settings benchmarked with production workloads?
  3. Should these be in a separate PR focused on storage optimization?
  4. What happens on systems with less available RAM when max-write-cache is 4GB?

@freemans13 freemans13 marked this pull request as draft January 19, 2026 18:30